Prototyping Modules for Shiny App for the Geospatial Analytics Project

Author

Santhya Selvan

Published

October 23, 2024

Modified

November 3, 2024

10.0 Overview

Our project aims to uncover the rate and distribution of different types of crimes in our neibhouring nation, Malaysia, using the following analysis methods: Exploratory Data Analysis, Exploratory Spatial Data Analysis and Clustering Analysis. This will help travellers and tourists visiting Malaysia to be more informed on their choice of vacation spots. It could also help policy makers decide where to focus their efforts in handling different types of crimes.

10.0.1 My Responsibilities

My responsibilities include:

  • Data Preparation

  • Basic EDA plots of Peninsular Malaysia

  • ESDA Measures on Peninsular Malaysia

  • Hierarchical Clustering on Peninsular Malaysia, based on type of crime

10.1 Getting Started

10.1.1 Loading in the packages

For our project, I will be needing a variety of packages, namely:

sf: Used in spatial data wrangling

tidyverse: Used in data wrangling for non-spatial data

tmap: For functions relating to mapping point patterns

sfdep: Functions that support Exploratory Data Analysis and is compatible with the sf and tidyverse packages

corrplot: package to help plot correlation matrices.

cluster, NbClust: packges that will be helpful for our hierarchical clustering needs.

ggpubr, GGally: packages extended on/ based on ggplot2. They provide functions for plots.

pacman::p_load(spdep, sfdep, tmap, sf, ClustGeo, ggpubr, cluster, factoextra, NbClust, heatmaply, corrplot, psych, tidyverse, GGally)

10.1.2 Loading in the datasets

For our project, we will be using three datasets:

  • crime_district.csv is a csv file containing data on crimes in Malaysia at the district level

  • population_district.csv is a csv file containing data of the population of Malaysia, also at the district level

  • Malaysia Admin 2 Boundary in shapefile format.

I will be loading them into the R environment using read_csv() function of the readr package for the aspatial data and the st_read() function of the sf package for the geospatial data.

crime_district <- read_csv("data/aspatial/crime_district.csv")
Rows: 19152 Columns: 6
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): state, district, category, type
dbl  (1): crimes
date (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
population_district <- read_csv("data/aspatial/population_district.csv")
Rows: 319200 Columns: 7
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr  (5): state, district, sex, age, ethnicity
dbl  (1): population
date (1): date

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
msia_adm2_sf <- st_read(dsn = 'data/geosp', layer = 'mys_admbnda_adm2_unhcr_20210211')
Reading layer `mys_admbnda_adm2_unhcr_20210211' from data source 
  `C:\santhyats\IS415-GAA\Take-Home_Exercises\Take-Home_Ex03\data\geosp' 
  using driver `ESRI Shapefile'
Simple feature collection with 144 features and 14 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: 99.64072 ymin: 0.855001 xmax: 119.2697 ymax: 7.380556
Geodetic CRS:  WGS 84

10.1.3 Data Wrangling

The crime dataset has rows that have crime data aggregated across all the districts, and across all the crime types. It also contains rows where the data is aggregated over just the crime types. All of these rows are grouped by their dates.

Hence, firstly, I will be formatting the crime data to only include the state names and their respective districts, and remove all the rows that contain aggregated values. Since population data is from 2020-2024, I will only keep the data for the years 2020-2023. All of this will be done using the filter() function.

crime_district <- crime_district %>% 
    filter(state != "Malaysia") %>% 
    filter(category == "assault") %>% 
    filter(district != "All") %>% 
    filter(type != "all") %>% 
   filter(year(ymd(date)) >= 2020 & year(ymd(date)) <= 2023)

Next, I will format the population data. I will filter the dates to exclude 2024 since we do not have crime data for that year, and once again remove all the rows where data is aggregated over a certain characteristic.

library(dplyr)
library(lubridate)

# population_state_filtered <- 
pop_data <- population_district %>%
    filter(!year(ymd(date)) == 2024) %>% 
    filter(sex == 'both' | sex == 'overall' ) %>% 
    filter(age == 'overall') %>% 
    filter(ethnicity == 'overall') %>% 
    select(-c(4:6))

As for the boundary sf, I will drop all the columns that are not needed.

adm2_sf <- msia_adm2_sf %>% select(-c(3:5, 8,9,11,12))

Next, I used the unique() function on the district columns of all the sf dataframes to see the discrepancies in the district names and district data.

unique(pop_data$district)
  [1] "Batu Pahat"             "Johor Bahru"            "Kluang"                
  [4] "Kota Tinggi"            "Kulai"                  "Mersing"               
  [7] "Muar"                   "Pontian"                "Segamat"               
 [10] "Tangkak"                "Baling"                 "Bandar Baharu"         
 [13] "Kota Setar"             "Kuala Muda"             "Kubang Pasu"           
 [16] "Kulim"                  "Langkawi"               "Padang Terap"          
 [19] "Pendang"                "Pokok Sena"             "Sik"                   
 [22] "Yan"                    "Bachok"                 "Gua Musang"            
 [25] "Jeli"                   "Kecil Lojing"           "Kota Bharu"            
 [28] "Kuala Krai"             "Machang"                "Pasir Mas"             
 [31] "Pasir Puteh"            "Tanah Merah"            "Tumpat"                
 [34] "Alor Gajah"             "Jasin"                  "Melaka Tengah"         
 [37] "Jelebu"                 "Jempol"                 "Kuala Pilah"           
 [40] "Port Dickson"           "Rembau"                 "Seremban"              
 [43] "Tampin"                 "Bentong"                "Bera"                  
 [46] "Cameron Highlands"      "Jerantut"               "Kuantan"               
 [49] "Lipis"                  "Maran"                  "Pekan"                 
 [52] "Raub"                   "Rompin"                 "Temerloh"              
 [55] "Bagan Datuk"            "Batang Padang"          "Hilir Perak"           
 [58] "Hulu Perak"             "Kampar"                 "Kerian"                
 [61] "Kinta"                  "Kuala Kangsar"          "Larut Dan Matang"      
 [64] "Manjung"                "Muallim"                "Perak Tengah"          
 [67] "Selama"                 "Perlis"                 "Barat Daya"            
 [70] "Seberang Perai Selatan" "Seberang Perai Tengah"  "Seberang Perai Utara"  
 [73] "Timur Laut"             "Beaufort"               "Beluran"               
 [76] "Kalabakan"              "Keningau"               "Kinabatangan"          
 [79] "Kota Belud"             "Kota Kinabalu"          "Kota Marudu"           
 [82] "Kuala Penyu"            "Kudat"                  "Kunak"                 
 [85] "Lahad Datu"             "Nabawan"                "Papar"                 
 [88] "Penampang"              "Pitas"                  "Putatan"               
 [91] "Ranau"                  "Sandakan"               "Semporna"              
 [94] "Sipitang"               "Tambunan"               "Tawau"                 
 [97] "Telupid"                "Tenom"                  "Tongod"                
[100] "Tuaran"                 "Asajaya"                "Bau"                   
[103] "Belaga"                 "Beluru"                 "Betong"                
[106] "Bintulu"                "Bukit Mabong"           "Dalat"                 
[109] "Daro"                   "Julau"                  "Kabong"                
[112] "Kanowit"                "Kapit"                  "Kuching"               
[115] "Lawas"                  "Limbang"                "Lubok Antu"            
[118] "Lundu"                  "Maradong"               "Marudi"                
[121] "Matu"                   "Miri"                   "Mukah"                 
[124] "Pakan"                  "Pusa"                   "Samarahan"             
[127] "Saratok"                "Sarikei"                "Sebauh"                
[130] "Selangau"               "Serian"                 "Sibu"                  
[133] "Simunjan"               "Song"                   "Sri Aman"              
[136] "Subis"                  "Tanjung Manis"          "Tatau"                 
[139] "Tebedu"                 "Telang Usan"            "Gombak"                
[142] "Klang"                  "Kuala Langat"           "Kuala Selangor"        
[145] "Petaling"               "Sabak Bernam"           "Sepang"                
[148] "Ulu Langat"             "Ulu Selangor"           "Besut"                 
[151] "Dungun"                 "Hulu Terengganu"        "Kemaman"               
[154] "Kuala Nerus"            "Kuala Terengganu"       "Marang"                
[157] "Setiu"                  "W.P. Kuala Lumpur"      "W.P. Labuan"           
[160] "W.P. Putrajaya"         "Cameron Highland"       "Sp Selatan"            
[163] "Sp Tengah"              "Sp Utara"              
unique(adm2_sf$ADM2_EN)
  [1] "Batu Pahat"        "Johor Bahru"       "Kluang"           
  [4] "Kota Tinggi"       "Kulaijaya"         "Ledang"           
  [7] "Mersing"           "Muar"              "Pontian"          
 [10] "Segamat"           "Baling"            "Bandar Baharu"    
 [13] "Kota Setar"        "Kuala Muda"        "Kubang Pasu"      
 [16] "Kulim"             "Langkawi"          "Padang Terap"     
 [19] "Pendang"           "Pokok Sena"        "Sik"              
 [22] "Yan"               "Bachok"            "Gua Musang"       
 [25] "Jeli"              "Kota Bharu"        "Kuala Krai"       
 [28] "Machang"           "Pasir Mas"         "Pasir Puteh"      
 [31] "Tanah Merah"       "Tumpat"            "WP. Kuala Lumpur" 
 [34] "W.P. Labuan"       "Alor Gajah"        "Jasin"            
 [37] "Melaka Tengah"     "Jelebu"            "Jempol"           
 [40] "Kuala Pilah"       "Port Dickson"      "Rembau"           
 [43] "Seremban"          "Tampin"            "Bentong"          
 [46] "Bera"              "Cameron Highlands" "Jerantut"         
 [49] "Kuantan"           "Lipis"             "Maran"            
 [52] "Pekan"             "Raub"              "Rompin"           
 [55] "Temerloh"          "Batang Padang"     "Hilir Perak"      
 [58] "Ulu Perak"         "Kampar"            "Kerian"           
 [61] "Kinta"             "Kuala Kangsar"     "Larut Dan Matang" 
 [64] "Manjung (Dinding)" "Perak Tengah"      "Perlis"           
 [67] "Barat Daya"        "S.P.Selatan"       "S.P. Tengah"      
 [70] "S.P. Utara"        "Timur Laut"        "Beaufort"         
 [73] "Beluran"           "Keningau"          "Kinabatangan"     
 [76] "Kota Belud"        "Kota Kinabalu"     "Kota Marudu"      
 [79] "Kuala Penyu"       "Kudat"             "Kunak"            
 [82] "Lahad Datu"        "Nabawan"           "Papar"            
 [85] "Penampang"         "Pitas"             "Putatan"          
 [88] "Ranau"             "Sandakan"          "Semporna"         
 [91] "Sipitang"          "Tambunan"          "Tawau"            
 [94] "Tenom"             "Tongod"            "Tuaran"           
 [97] "Asajaya"           "Bau"               "Belaga"           
[100] "Betong"            "Bintulu"           "Dalat"            
[103] "Daro"              "Julau"             "Kanowit"          
[106] "Kapit"             "Kuching"           "Lawas"            
[109] "Limbang"           "Lubok Antu"        "Lundu"            
[112] "Marudi"            "Matu"              "Meradong"         
[115] "Miri"              "Mukah"             "Pakan"            
[118] "Samarahan"         "Saratok"           "Sarikei"          
[121] "Selangau"          "Serian"            "Sibu"             
[124] "Simunjan"          "Song"              "Sri Aman"         
[127] "Tatau"             "Besut"             "Dungun"           
[130] "Hulu Terengganu"   "Kemaman"           "Kuala Terengganu" 
[133] "Marang"            "Setiu"             "W.P. Putrajaya"   
[136] "Gombak"            "Ulu Langat"        "Ulu Selangor"     
[139] "Klang"             "Kuala Langat"      "Kuala Selangor"   
[142] "Petaling"          "Sabak Bernam"      "Sepang"           
unique(crime_district$district)
  [1] "Batu Pahat"             "Iskandar Puteri"        "Johor Bahru Selatan"   
  [4] "Johor Bahru Utara"      "Kluang"                 "Kota Tinggi"           
  [7] "Kulaijaya"              "Ledang"                 "Mersing"               
 [10] "Muar"                   "Nusajaya"               "Pontian"               
 [13] "Segamat"                "Seri Alam"              "Baling"                
 [16] "Bandar Baharu"          "Kota Setar"             "Kuala Muda"            
 [19] "Kubang Pasu"            "Kulim"                  "Langkawi"              
 [22] "Padang Terap"           "Pendang"                "Sik"                   
 [25] "Yan"                    "Bachok"                 "Gua Musang"            
 [28] "Jeli"                   "Kota Bharu"             "Kuala Krai"            
 [31] "Machang"                "Pasir Mas"              "Pasir Puteh"           
 [34] "Tanah Merah"            "Tumpat"                 "Alor Gajah"            
 [37] "Jasin"                  "Melaka Tengah"          "Jelebu"                
 [40] "Jempol"                 "Kuala Pilah"            "Nilai"                 
 [43] "Port Dickson"           "Rembau"                 "Seremban"              
 [46] "Tampin"                 "Bentong"                "Bera"                  
 [49] "Cameron Highlands"      "Jerantut"               "Kuala Lipis"           
 [52] "Kuantan"                "Maran"                  "Pekan"                 
 [55] "Raub"                   "Rompin"                 "Temerloh"              
 [58] "Batu Gajah"             "Gerik"                  "Hilir Perak"           
 [61] "Ipoh"                   "Kampar"                 "Kerian"                
 [64] "Kuala Kangsar"          "Manjung"                "Pengkalan Hulu"        
 [67] "Perak Tengah"           "Selama"                 "Sungai Siput"          
 [70] "Taiping"                "Tanjong Malim"          "Tapah"                 
 [73] "Arau"                   "Kangar"                 "Padang Besar"          
 [76] "Barat Daya"             "Seberang Perai Selatan" "Seberang Perai Tengah" 
 [79] "Seberang Perai Utara"   "Timur Laut"             "Beaufort"              
 [82] "Beluran"                "Keningau"               "Kota Belud"            
 [85] "Kota Kinabalu"          "Kota Kinabatangan"      "Kota Marudu"           
 [88] "Kudat"                  "Kunak"                  "Lahad Datu"            
 [91] "Papar"                  "Penampang"              "Ranau"                 
 [94] "Sandakan"               "Semporna"               "Sipitang"              
 [97] "Tawau"                  "Tenom"                  "Tuaran"                
[100] "W.P. Labuan"            "Bau"                    "Belaga"                
[103] "Betong"                 "Bintulu"                "Dalat"                 
[106] "Julau"                  "Kanowit"                "Kapit"                 
[109] "Kota Samarahan"         "Kuching"                "Lawas"                 
[112] "Limbang"                "Lubok Antu"             "Lundu"                 
[115] "Marudi"                 "Matu Daro"              "Meradong"              
[118] "Miri"                   "Mukah"                  "Padawan"               
[121] "Saratok"                "Sarikei"                "Serian"                
[124] "Sibu"                   "Simunjan"               "Song"                  
[127] "Sri Aman"               "Tatau"                  "Ampang Jaya"           
[130] "Gombak"                 "Hulu Selangor"          "Kajang"                
[133] "Klang Selatan"          "Klang Utara"            "Kuala Langat"          
[136] "Kuala Selangor"         "Petaling Jaya"          "Sabak Bernam"          
[139] "Sepang"                 "Serdang"                "Sg. Buloh"             
[142] "Shah Alam"              "Subang Jaya"            "Sungai Buloh"          
[145] "Besut"                  "Dungun"                 "Hulu Terengganu"       
[148] "Kemaman"                "Kuala Terengganu"       "Marang"                
[151] "Setiu"                  "Brickfields"            "Cheras"                
[154] "Dang Wangi"             "Sentul"                 "W.P. Putrajaya"        
[157] "Wangsa Maju"           

As expected, all the dataframes contained different numbers of districts. There were also other issues such as differing spelling/naming conventions, splitting of districts into north/south regions, and districts missing from the boundary sf that were present in the aspatial datasets.

As little can be done in salvaging districts that were not present in the boundary layer, I decided to handle the other forms of discrepancies.

First, I will convert all the district names to lowercase letters to eliminate any errors on that front, and to standardise the naming conventions.

crime_district$district <- tolower(trimws(crime_district$district))
adm2_sf$ADM2_EN <- tolower(trimws(adm2_sf$ADM2_EN))
pop_data$district <- tolower(trimws(pop_data$district))

Since our main focus was crime data and it was the dataframe that will be mutually connected to both the boundary layer and the population data (phrase better), we will use the districts in the crime dataframe as a guide to formatting the other tables.

We first started off by handling the very obvious differences that could be observed through visual inspection as well as through the use of the unique function. We noticed that 2 districts, Johor Bahru and Klang, were split up in the crime dataframe. Hence, we merged these rows and summed up their crime number values.

str_detect() function of the stringr package was used to identify rows that contained the given input string. In our case, it was the district name. mutate() was then used to change the values of these rows. Since this produces a new dataframe, another dataframe was created with the rest of the values, before all three were bound together.

library(stringr)
crime_johor<- crime_district %>%
  filter(str_detect(district,"johor bahru")) %>%
  mutate(district = 'johor bahru') %>%
  group_by(state, district,type, date) %>%
  summarise(crimes = sum(crimes))
`summarise()` has grouped output by 'state', 'district', 'type'. You can
override using the `.groups` argument.
crime_klang<- crime_district %>%
  filter(str_detect(district,"klang")) %>%
  mutate(district = 'klang') %>%
  group_by(state, district,type, date) %>%
  summarise(crimes = sum(crimes))
`summarise()` has grouped output by 'state', 'district', 'type'. You can
override using the `.groups` argument.
other_rows <- crime_district %>% 
  filter(!str_detect(district, "johor bahru"))

crime_district <- bind_rows(crime_johor, crime_klang, other_rows) %>% 
  select(-c(6)) %>% 
  mutate(category = 'assault') 

Upon further examination, there were more discrepancies that I found so I sought to change them.

crime_district<- crime_district%>% 
  mutate(district = ifelse(district == "cameron highland", "cameron highlands", district)) %>% 
  mutate(district = ifelse(district == "kuala lipis", "lipis", district)) %>% 
  mutate(district = ifelse(district == "kota kinabatangan", "kinabatangan", district)) %>% 
  mutate(district = ifelse(district == "seberang perai tengah", "s.p. tengah", district)) %>% 
  mutate(district = ifelse(district == "seberang perai utara", "s.p. utara", district)) %>% 
  mutate(district = ifelse(district == "seberang perai selatan", "s.p.selatan", district)) %>% 
  mutate(district = ifelse(district == "petaling jaya", "petaling", district)) %>% 
   mutate(district = ifelse(district == "matu daro", "matu", district)) %>% 
  mutate(state = ifelse(district == "w.p. labuan", "Sabah", state))
   
  
  

pop_data <- pop_data %>% 
  mutate(district = ifelse(district == "kulai", "kulaijaya", district)) %>% 
  mutate(district = ifelse(district == "sp tengah", "s.p. tengah", district)) %>% 
  mutate(district = ifelse(district == "sp utara", "s.p. utara", district)) %>% 
  mutate(district = ifelse(district == "sp selatan", "s.p.selatan", district)) %>% 
  mutate(district = ifelse(district == "cameron highland", "cameron highlands", district))


adm2_sf <- adm2_sf %>% 
  mutate(ADM1_EN = ifelse(ADM2_EN == "w.p. labuan", "Sabah", ADM1_EN))
pop_crime<- left_join(pop_data, crime_district, by= "district")
Warning in left_join(pop_data, crime_district, by = "district"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 57 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
pop_crime_na<- pop_crime[rowSums(is.na(pop_crime)) > 0,]

drop_list_pop <- pop_crime_na$district
pop_data <- pop_data %>%
  filter(!(district %in% drop_list_pop))

pop_crime2<-left_join(crime_district, pop_data, by="district")
Warning in left_join(crime_district, pop_data, by = "district"): Detected an unexpected many-to-many relationship between `x` and `y`.
ℹ Row 1 of `x` matches multiple rows in `y`.
ℹ Row 2 of `y` matches multiple rows in `x`.
ℹ If a many-to-many relationship is expected, set `relationship =
  "many-to-many"` to silence this warning.
pop_crime_na2 <-pop_crime2[rowSums(is.na(pop_crime2)) > 0,]

drop_list_pop <- pop_crime_na2$district
crime_district <- crime_district %>%
  filter(!(district %in% drop_list_pop))

Next, I will left join the boundary and the population dataframes to the crime dataframe and vice versa, and remove all the rows that do not match.

adm_crime <- left_join(adm2_sf, crime_district, by= c("ADM2_EN" = "district"))
adm_crime_na <- adm_crime[rowSums(is.na(adm_crime)) > 0,]

drop_list <- adm_crime_na$ADM2_EN
adm2_sf <- adm2_sf %>% 
  filter(!(ADM2_EN %in% drop_list))

adm_crime2 <- left_join(crime_district, adm2_sf, by= c("district" = "ADM2_EN"))
adm_crime_na2 <- adm_crime2[rowSums(is.na(adm_crime2)) > 0,]

drop_list <- adm_crime_na2$district
crime_district <- crime_district %>% 
  filter(!(district %in% drop_list))

Finally, we then calculated a ratio that is representative of the crime rate in each district by dividing the number of crimes in each district by the population figures for that district and multiplying this by 1000. We then added a new column to the boundary sf and the crime_rate sf to indicate if the district belonged to the West or Peninsular region of Malaysia. Since the West and Peninsular regions are not connected by admin borders, we would have to process these regions separately for our analysis.

crime_distict <- crime_district %>%  ungroup()
rate_crime_district <- crime_district %>% 
    left_join(pop_data, by = c("district", "state", "date")) %>% 
    mutate(crime_rate = (crimes / population) * 1000)

# assigning Penisular , East region
rate_crime_district <- rate_crime_district %>% 
    mutate(region = case_when(
        state %in% c("Sabah", "Sarawak") ~ "East",
        TRUE ~ "Peninsular"
  ))
adm2_sf <- adm2_sf %>%
  mutate(region = case_when(
        ADM1_EN %in% c("Sabah", "Sarawak") ~ "East",
        TRUE ~ "Peninsular"))

crime_boundary <- left_join(crime_district, adm2_sf, by = c("district" = "ADM2_EN")) %>% 
  select(-c(7:12))
write_rds(crime_boundary, "data/rds/crime_boundary.rds")

crime_boundary<- crime_boundary %>% 
  ungroup()

rate_crime_district_bound <- left_join(rate_crime_district, adm2_sf, by = c("district" = "ADM2_EN")) %>% 
  select(-c(10:15))


pop_data <- pop_data %>% ungroup()
adm2_sf <- adm2_sf %>%  ungroup()
rate_crime_district <- rate_crime_district %>%  ungroup()
crime_distict <- crime_district %>%  ungroup()
rate_crime_district_bound<- rate_crime_district_bound %>%  ungroup()

Finally, I will write them into rds files for easier access.

write_rds(pop_data, "data/rds/pop_data.rds")
write_rds(crime_district, "data/rds/crime_district.rds")
write_rds(adm2_sf, "data/rds/adm2_sf.rds")
write_rds(rate_crime_district, "data/rds/rate_crime_district.rds")
write_rds(crime_boundary, "data/rds/crime_boundary.rds")
write_rds(rate_crime_district_bound, "data/rds/rate_crime_district_bounds.rds")
pop_data<- read_rds("data/rds/pop_data.rds")
crime_district <- read_rds("data/rds/crime_district.rds")
adm2_sf <- read_rds("data/rds/adm2_sf.rds")
rate_crime_district<-read_rds("data/rds/rate_crime_district.rds")
crime_boundary <- read_rds("data/rds/crime_boundary.rds")
rate_crime_district_bound <- read_rds("data/rds/rate_crime_district_bounds.rds")

10.2 Exploratory Data Analysis

10.2.1 Interactive Choropleth Map

For our EDA portion, I will be plotting an interactive choropleth map.

crime_boundary_west <- crime_boundary %>%  filter(region == "Peninsular")
crime_boundary_west <- st_as_sf(crime_boundary_west)
crime_boundary_west <- st_transform(crime_boundary_west, 3375)

This is the back-end code chunk used to generate the choropleth map. Functions of the tmap package was used to achieve this.

crime_types <- unique(crime_boundary_west$type)


for (crime in crime_types) {
  crime_data <- crime_boundary_west %>% filter(type == crime)

  tmap_mode("view") 
  map <- tm_shape(crime_data) +
    tm_polygons("crimes", title = paste("Crimes -", crime), 
                style = "quantile", palette = "Reds") +
    tm_layout(title = paste("Choropleth Map for", crime),
              legend.outside = TRUE)
  
  print(map)
}
tmap mode set to interactive viewing
tmap mode set to interactive viewing
tmap mode set to interactive viewing
tmap mode set to interactive viewing
tmap mode set to interactive viewing
tmap mode set to interactive viewing
tmap mode set to interactive viewing
10.2.1.1 Shiny Board

Users can input the type of crime, classification method, number of classes, colour scheme and the transparency of the map. As the user input the different fields, the interactive map will be automatically updated on the main panel.

Hovering over the areas will reveal the name of the district, while a click will reveal the raw crime numbers for the district. It is to be noted that these figures are aggregated figures over the 4-year time period of our dataset.

10.3 Exploratory Spatial Data Analysis (ESDA)

For ESDA, I will be focusing on the Global and Local Measures of Spatial Autocorrelation. Namely, I will be using Global Moran I’s Statistic for the Global Measure, and Local Moran I’s and Lisa Plots for the Local Measures.

10.3.1 Global Measures of Spatial Autocorrelation

Below is a sample code that was used in the back-end to generate the Global Moran I statistics. st_contiguity() and st_weight() functions of the sfdep package were used to generate the weights matrix for the districts. Subsequently, global_moran_test(), also from the same package, was used to generate the Global Moran I statistic. The results were saved in a dataframe format.

crime_data <- crime_boundary_west %>% 
   filter((year(ymd(date.x))) == 2020 & type == "murder") 

nb<- crime_data %>% 
  st_contiguity(crime_data$geometry, queen = TRUE)

nb[17]<- as.integer(19) #handle case for langkawi, which is not connected to the others by admin boundary
  
crime_data_wm <- crime_data %>% 
  mutate(nb = nb, .before = 1) %>% 
  mutate(wt = st_weights(nb, style = 'W'), 
         .before = 1) %>% 
  select(1,2)



global_moran <- global_moran_test(crime_data$crimes, 
             crime_data_wm$nb,
             crime_data_wm$wt)

moran_results <- data.frame(
  Statistic = "Global Moran's I",
  Moran_I = global_moran$estimate[1],        
  Expectation = global_moran$estimate[2],    
  Variance = global_moran$estimate[3],       
  P_value = global_moran$p.value            
)

moran_results
                         Statistic   Moran_I Expectation    Variance   P_value
Moran I statistic Global Moran's I 0.1829003 -0.01333333 0.006630579 0.0079786

Since the statistics were going to be published on a web app, I wanted to present them in a more user-friendly and polished format. I used the gt package, a package that specifically supports the creation of neater tables in R, to format our newly created dataframe of the Global Moran I Statistic in a neat table format.

library(gt)

moran_results_gt <- moran_results %>%
  gt() %>%
  tab_header(
    title = "Global Moran's I Test Results",
    subtitle = "Spatial Autocorrelation Analysis"
  ) %>%
  fmt_number(
    columns = c(Moran_I, Expectation, Variance, P_value),
    decimals = 7
  ) %>%
  cols_label(
    Statistic = "Statistic",
    Moran_I = "Moran's I",
    Expectation = "Expected Value",
    Variance = "Variance",
    P_value = "P-value"
  ) %>%
  tab_source_note(
    source_note = "Results generated using global_moran_test() function"
  )

moran_results_gt
Global Moran's I Test Results
Spatial Autocorrelation Analysis
Statistic Moran's I Expected Value Variance P-value
Global Moran's I 0.1829003 −0.0133333 0.0066306 0.0079786
Results generated using global_moran_test() function
10.3.1.1 Shiny Board

This is the view on the Shiny Dashboard. Users can input the year, type of crime, contiguity method, and their weights calculation style. The Global Moran I Statistics table will be generated when the user clicks the “Generate Statistics” button.

10.3.2 Local Measures of Spatial Autocorrelation

For the Local Measures of Spatial Autocorrelation, Local Moran I and LISA plots were generated. Functions of the sfdep and tmap packages were once again used to achieve this task. Since Langkawi is an island of Peninsular Malaysia, I manually set a neighbour for it. Below are sample code chunks used in the back-end for the Shiny App.

crime_data <- crime_boundary_west %>% 
   filter((year(ymd(date.x))) == 2022 & type == "rape") 

nb<- crime_data %>% 
  st_contiguity(crime_data$geometry, queen = TRUE)

nb[17]<- as.integer(19) #handle case for langkawi, which is not connected to the others by admin boundary
  
crime_data_wm <- crime_data %>% 
  mutate(nb = nb, .before = 1) %>% 
  mutate(wt = st_weights(nb, style = 'W'), 
         .before = 1) %>% 
  select(1,2)
lisa <- crime_data %>%
        mutate(local_moran = local_moran(crime_data$crimes, crime_data_wm$nb, crime_data_wm$wt, nsim = 99), .before = 1) %>%
        unnest(local_moran)
      
lisa <- lisa %>%
        rename("local moran(ii)" = "ii", "expectation (eii)" = "eii",
               "variance(var_ii)" = "var_ii", "std deviation(z_ii)" = "z_ii",
               "p_value" = "p_ii")

lisa_sig <- lisa  %>%
        filter(p_value < 0.95)  
tmap_mode("plot") +
tm_shape(lisa) + 
  tm_fill("local moran(ii)") +
  tm_borders(alpha = 0.5) + 
tm_view(set.zoom.limits = c(6,8)) +
  tm_layout(main.title = "2017",
            main.title.size= 1)
tmap mode set to plotting
Variable(s) "local moran(ii)" contains positive and negative values, so midpoint is set to 0. Set midpoint = NA to show the full spectrum of the color palette.

tmap_mode("plot")
tmap mode set to plotting
tm_shape(lisa) + 
  tm_polygons() +
  tm_borders(alpha = 0.5) +
  
  tm_shape(lisa_sig) +
  tm_fill(col = "mean",
          palette = "-RdBu") +
  tm_borders(alpha = 0.4)
Warning: One tm layer group has duplicated layer types, which are omitted. To
draw multiple layers of the same type, use multiple layer groups (i.e. specify
tm_shape prior to each of them).

10.3.2.1 Shiny Board

Users will be able to choose the year, crime type, contiguity method, weights calculation method and the number of simulations for the generation of the Local Moran I’s plot. As for the LISA plot, they will be able to choose the confidence level, classification type and the statistic. The maps will be updated with the click of the “Update Plot” button.

10.4 Clustering Analysis - Hierarchical Clustering

In order to derive the hierarchical cluster dendrograms, I will first need to reformat the data. I will convert the types of crimes to separate columns.

ci <- rate_crime_district_bound %>% 
  filter(type == "causing_injury") %>% 
  select(-c(3, 5:7, 11)) %>% 
  rename(`causing_injury`= `crime_rate`)

mr <- rate_crime_district_bound %>% 
  filter(type == "murder") %>% 
  select(-c(3, 5:7, 11)) %>% 
  rename(`murder`= `crime_rate`)

rp <- rate_crime_district_bound %>% 
  filter(type == "rape") %>% 
  select(-c(3, 5:7, 11)) %>% 
  rename(`rape`= `crime_rate`)

rga <- rate_crime_district_bound %>% 
  filter(type == "robbery_gang_armed") %>% 
  select(-c(3, 5:7, 11)) %>% 
  rename(`robbery_gang_armed`= `crime_rate`)

rgu <- rate_crime_district_bound %>% 
  filter(type == "robbery_gang_unarmed") %>% 
  select(-c(3, 5:7, 11)) %>% 
  rename(`robbery_gang_unarmed`= `crime_rate`)

rsa <- rate_crime_district_bound %>% 
  filter(type == "robbery_solo_armed") %>% 
  select(-c(3, 5:7, 11)) %>% 
  rename(`robbery_solo_armed`= `crime_rate`)

rsu <- rate_crime_district_bound %>% 
  filter(type == "robbery_solo_unarmed") %>% 
  select(-c(3, 5:7, 11)) %>% 
  rename(`robbery_solo_unarmed`= `crime_rate`)
rate_crime_prep <- ci%>% 
  mutate(murder = mr$murder, .before = 5) %>% 
  mutate(rape = rp$rape, .before = 6) %>% 
  mutate(robbery_gang_armed = rga$robbery_gang_armed, .before = 7) %>% 
  mutate(robbery_gang_unarmed = rgu$robbery_gang_unarmed, .before = 8) %>% 
  mutate(robbery_solo_armed = rsa$robbery_solo_armed, .before = 9) %>% 
  mutate(robbery_solo_unarmed = rsu$robbery_solo_unarmed, .before = 10)

write_rds(rate_crime_prep, "data/rds/rate_crime_prep.rds")
rate_crime_prep <- read_rds("data/rds/rate_crime_prep.rds")

I will then standardise the data in order to be used for our Heat Map Plot later on.

rate_crime_data <- rate_crime_prep %>% 
  filter(year(ymd(date.x)) == 2020)

rate_crime_data <- as.data.frame(rate_crime_data)
row.names(rate_crime_data) <- rate_crime_data$district 

rate_crime_data<- rate_crime_data %>% 
  select(-c(2))

rate_crime_data.std <- normalize(rate_crime_data)

Using dist() function of the stats package, I derived the proximity matrix.

rate_crime_data_numeric <- rate_crime_data %>% select(where(is.numeric))
rate_crime_data_numeric <- rate_crime_data_numeric %>% drop_na() 
proxmat <- dist(rate_crime_data_numeric, method = 'euclidean')

Next, I used hclust(), also from the stats package to derive the dendrogram. rect.hclust() was also used to highlight the clusters.

hclust_ward <- hclust(proxmat, method = 'ward.D')
plot(hclust_ward, cex = 0.6)
rect.hclust(hclust_ward, 
            k = 4, 
            border = 2:5)

I also used the sample code below to derive an interactive heatmap of the clusters.

rate_crime_data.std <- rate_crime_data.std %>%  drop_na()
heatmaply(rate_crime_data.std,
          Colv=NA,
          dist_method = "euclidean",
          hclust_method = "ward.D",
          seriate = "OLO",
          colors = Purples,
          k_row = 6,
          margins = c(NA,200,60,NA),
          fontsize_row = 4,
          fontsize_col = 5,
          main="Geographic Segmentation of Malaysia by Crime Type",
          xlab = "Crime Type",
          ylab = "Districts"
          )

10.4.1 Shiny Board

The user can input the year, the method used in computing the proximity matrix, as well as the hierarchical clusters, and also have the option to choose how the clusters differe with different number of “optimal clusters”.

The “Generate Dendrogram” button will generate the dendrogram.

The “Generate Heatmap” button will generate an Interactive Heatmap.